Skip to content

Conversation

@luca-della-vedova
Copy link
Collaborator

This PR revives #262 from @nnmm (as evident from the first commit that actually does 95% of the work!), with a few minor updates:

  • Solve whatever merge conflict from 3 years ago and bring up to date with latest supported ROS distros.
  • Migrate to the new pub/sub async workers APIs.
  • Make some of the public API a bit more robust (i.e. introduce a struct for message types rather than having to rely on strings).
  • Add tests for basic pub / sub behavior to validate that this is actually working.

I'll admit that most of the code under field_access (where the casting and reinterpreting of arbitrary messages is done) is a bit out of my comfort zone, however loopback tests, message serialization and deserialization seem to work so it is at least doing something sensible.

Following the initial implementation this is all under a dyn_msg feature flag, but I'm not 100% convinced that the additional [cfg(feature = "dyn_msg")] directives that are scattered throughout the codebase are worth it. Specifically on my (very overpowered) machine a clean build with dyn_msg enabled builds a total of 93 packages while a clean build without the feature builds 89 packages. The compile time difference itself is hard to detect.

Originally, the PR was split into a separate one that only included the message structure. I'm happy to split this into smaller chunks if that makes it easier to review but I thought I would show what a full implementation looks like first.

This only tackles messages, publishers and subscribers. Services (and actions) are left as a future implementation.

nnmm and others added 30 commits November 3, 2022 14:35
Signed-off-by: Luca Della Vedova <[email protected]>
Signed-off-by: Luca Della Vedova <[email protected]>
Signed-off-by: Luca Della Vedova <[email protected]>
Signed-off-by: Luca Della Vedova <[email protected]>
Signed-off-by: Luca Della Vedova <[email protected]>
Signed-off-by: Luca Della Vedova <[email protected]>
Signed-off-by: Luca Della Vedova <[email protected]>
Signed-off-by: Luca Della Vedova <[email protected]>
Signed-off-by: Luca Della Vedova <[email protected]>
Signed-off-by: Luca Della Vedova <[email protected]>
Signed-off-by: Luca Della Vedova <[email protected]>
Signed-off-by: Luca Della Vedova <[email protected]>
Signed-off-by: Luca Della Vedova <[email protected]>
Signed-off-by: Luca Della Vedova <[email protected]>
Signed-off-by: Luca Della Vedova <[email protected]>
Signed-off-by: Luca Della Vedova <[email protected]>
Signed-off-by: Luca Della Vedova <[email protected]>
Signed-off-by: Luca Della Vedova <[email protected]>
Signed-off-by: Luca Della Vedova <[email protected]>
Signed-off-by: Luca Della Vedova <[email protected]>
Signed-off-by: Luca Della Vedova <[email protected]>
Signed-off-by: Luca Della Vedova <[email protected]>
Signed-off-by: Luca Della Vedova <[email protected]>
Signed-off-by: Luca Della Vedova <[email protected]>
Signed-off-by: Luca Della Vedova <[email protected]>
Signed-off-by: Luca Della Vedova <[email protected]>
@TonyWelte
Copy link

TonyWelte commented Aug 5, 2025

I've started using this pull request in one of my projects and I have noticed an issue with messages containing sequences of messages (for instance test_msgs/msg/UnboundedSequences).

The program panics when trying to access an empty sequence of messages. To reproduce:

let message_type = MessageTypeName {
    package_name: "test_msgs".to_owned(),
    type_name: "UnboundedSequences".to_owned(),
};
let mut msg = DynamicMessage::new(message_type).unwrap();

let value = msg.get("basic_types_values");

In dynamic_sequences.rs l.200 std::slice::from_raw_parts is called with a null pointer which triggers the panic.

@mxgrey
Copy link
Collaborator

mxgrey commented Aug 6, 2025

In dynamic_sequences.rs l.200 std::slice::from_raw_parts is called with a null pointer which triggers the panic.

This reminds me of #407.

Signed-off-by: Luca Della Vedova <[email protected]>
@luca-della-vedova
Copy link
Collaborator Author

Hello, hope this is working well for you!

I fixed the panic for both mutable and non mutable access and added (+ expanded) your snippet as a regression test in e21a4a7, let me know if it works for you :)

@TonyWelte
Copy link

Thank you for the quick fix. It's working now

@esteve
Copy link
Collaborator

esteve commented Nov 12, 2025

@luca-della-vedova I've fixed the remaining conflict, but let's see if CI works. I'm happy to take it from here, I used to have a branch with Nikolai's changes fleshed out and would like to get this merged for the next release.

@esteve
Copy link
Collaborator

esteve commented Nov 12, 2025

@luca-della-vedova BTW, I'll get rid of the dyn_msg feature flag and add this feature as part of the normal codebase.

@luca-della-vedova
Copy link
Collaborator Author

@esteve I saw that you opened #551 and I have to admit I'm not really happy with this way to proceed.

In my view, the process for contributions should be:

  • An author opens a PR.
  • Reviews are submitted, the owner addresses them.
  • If the reviewer believes writing a review is more work than doing the change themselves, open a PR against the author's fork.
  • After iteration, PR is merged.
  • If the author becomes unresponsive or the PR risks becoming stale, another maintainer steps in and forks the PR (what you did).

Since this PR was opened about 5 months ago I only received one comment about a panic and addressed it within 24h, so in my opinion forking the PR is not really justified right now.
I also looked at your PR and the changes seem to mainly be:

  • Removing the dyn_msg feature. I brought up in the first message of the PR that I felt this would have been the right move and I was waiting for a maintainer to give me a green light, just in case the dependency load was a major concern. Raising that comment would have been enough rather than forking.
  • Merging main and resolving conflicts. That's trivial but I can only do it if the PR is being reviewed, it's been 5 months of silence and I merged main + solve conflicts twice but there is no point doing it if no reviews are coming. A simple "Can you merge main so we can get CI running and iterate on this PR?" would have been sufficient.
  • Disabling the library cache feature. This is an optimization I introduced. I'm happy to revert it if maintainers feel it's not a good idea. Is it unnecessary? Overengineered? Too hacky? Buggy? Again all is valid feedback but forking the PR and silently removing the change is, in my opinion, not the right approach.

@esteve
Copy link
Collaborator

esteve commented Nov 13, 2025

@luca-della-vedova sorry, I didn't mean to offend you, I really appreciate all the work you did. I opened the other PR to trigger the CI because I couldn't push here, and then got so into it that started hacking on stuff to get the CI passing, but I'm happy to submit a PR to your branch and continue the discussion here. Again, sorry this caused any bad feelings.

@esteve
Copy link
Collaborator

esteve commented Nov 13, 2025

@luca-della-vedova I think the two first points are addressed/discussed and that we're in agreement.

Regarding the third point, it's not that it's hacky or overengineered, I think it's useful, but lazy_cell, which LazyLock depends on, was only made stable in Rust 1.80 and since CI is targeting 1.75, I reverted to get the CI to pass.

@mxgrey
Copy link
Collaborator

mxgrey commented Nov 13, 2025

lazy_cell, which LazyLock depends on, was only made stable in Rust 1.80

If this is the only blocker, it should be possible to replace all uses of LazyLock with OnceLock. The only downside is that it requires some extra syntax. Instead of

pub static DYNAMIC_MESSAGE_PACKAGE_CACHE: LazyLock<Mutex<DynamicMessageLibraryCache>> =
    LazyLock::new(|| Default::default());

we would need to have

pub fn get_dynamic_message_package_cache() -> &'static Mutex<DynamicMessageLibraryCache> {
    static DYNAMIC_MESSAGE_PACKAGE_CACHE: OnceLock<Mutex<DynamicMessageLibraryCache>> = OnceLock::new();
    DYNAMIC_MESSAGE_PACKAGE_CACHE.get_or_init(|| Default::default())
}

Then wherever we would've called

DYNAMIC_MESSAGE_PACKAGE_CACHE.lock().unwrap()

we instead call

get_dynamic_message_package_cache().lock().unwrap()

@esteve
Copy link
Collaborator

esteve commented Nov 13, 2025

@mxgrey that works for me. @luca-della-vedova thoughts?

@luca-della-vedova
Copy link
Collaborator Author

That's odd, I enabled the "allow edits from maintainers", and it seems that a few weeks ago you managed to push a "merge to main" into this branch (8cf5ce0)?
Anyway, no worries I knew there was no bad intention but thought I'd just share openly how I felt about it, even though as I mentioned Nikolai did 95% of the work reviving this was not a simple endeavor so some personal feelings got in the way!
WRT the changes if that's OK I'll cherry-pick them here this time for simplicity instead of going through a whole process of PRs that maybe don't run CI etc. I'll have to double check some parts though (a test change looked a bit suspicious but I haven't had time to look more deeply, as well as the OnceLock issue), I'll try to get this done next week. It's been some time since I wrote it and it seems that in my last commit CI including the "Rust Minimal" was fully green but to be honest now that you mention that LazyLock was not stabilized before 1.80 I can't really say I know how it is possible that it was green, it might be that the CI was not testing the right features in the first place.

@esteve
Copy link
Collaborator

esteve commented Nov 14, 2025

@luca-della-vedova

That's odd, I enabled the "allow edits from maintainers", and it seems that a few weeks ago you managed to push a "merge to main" into this branch (8cf5ce0)?

I merged that two days ago via the website. But now I realize what happened, I checked out your branch before I did the merge with main, clicked merge and then the branches diverged. And instead of actually reading the error message when I did git push, I just assumed you hadn't enable edits from maintainers. Which is absurd, because I guess I wouldn't have been able to merge with main in the first case 🤦

Anyway, this was an example of Hanlon's razor, so in the end it was my fault for not carefully reading the error message from GitHub, but there was no malice at all 🙂

WRT the changes if that's OK I'll cherry-pick them here this time for simplicity instead of going through a whole process of PRs that maybe don't run CI etc.

Sounds good to me, whatever is easier to you.

It's been some time since I wrote it and it seems that in my last commit CI including the "Rust Minimal" was fully green but to be honest now that you mention that LazyLock was not stabilized before 1.80 I can't really say I know how it is possible that it was green, it might be that the CI was not testing the right features in the first place.

I saw it in the logs of the other PR, and then found about LazyLock being stable in 1.80 in https://blog.rust-lang.org/2024/07/25/Rust-1.80.0/#what-s-in-1-80-0-stable The solution that @mxgrey proposes based on OnceLock seems equivalent.

Signed-off-by: Luca Della Vedova <[email protected]>
@luca-della-vedova
Copy link
Collaborator Author

I think I found out why I didn't see the issue with 1.75. I use a newer version locally and the CI isn't actually doing anything.
Specifically, a successful run https://github.com/ros2-rust/ros2_rust/actions/runs/19299045638/job/55188284077 shows that the "run cargo test on rust packages" action actually doesn't do anything, run cd ros_ws then exits instantly.
Now because this PR removes the dyn_msg feature we should be fine but we should figure out the CI issue since, if I understand correctly, we are not actually testing the non-default features of rclrs (after this PR serde).

Signed-off-by: Luca Della Vedova <[email protected]>
Signed-off-by: Luca Della Vedova <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants